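Autonomous navigation in densely populated areas remains a challenging task for robots because it is difficult to guarantee safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that provides continuous obstacle avoidance and post-contact control on an autonomous vehicle. We propose evaluation metrics that account for efficiency, controller response, and crowd interactions in natural crowds. We report the results of more than 110 trials across different crowd types (sparse, flows, and mixed traffic) with low (<0.15 ppsm), mid (<0.65 ppsm), and high (<1 ppsm) pedestrian densities. We present comparative results between two low-level obstacle avoidance methods and a shared-control baseline. The results show a 10% drop in relative time on the highest-density tests, with no decrease in any other efficiency metric. Moreover, autonomous navigation proved comparable to shared-control navigation, with lower relative jerk and significantly higher command fluency, indicating high compatibility with the crowd. We conclude that the reactive controller fulfills the necessary task of fast and continuous adaptation to crowd navigation, and that it should be coupled with high-level planners for environmental and situational awareness.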
translated by 谷歌翻译
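The density bands quoted in the crowd navigation abstract can be made concrete with a small sketch. It assumes that "ppsm" means pedestrians per square meter and that each band is bounded above by the quoted threshold; the function names and the "above tested range" label are hypothetical and do not come from the paper.

```python
def crowd_density_ppsm(pedestrian_count: int, area_m2: float) -> float:
    """Pedestrians per square meter (assumed meaning of 'ppsm')."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return pedestrian_count / area_m2


def density_class(ppsm: float) -> str:
    """Map a density to the low/mid/high bands quoted in the abstract."""
    if ppsm < 0:
        raise ValueError("density cannot be negative")
    if ppsm < 0.15:
        return "low"
    if ppsm < 0.65:
        return "mid"
    if ppsm < 1.0:
        return "high"
    # Hypothetical label: the abstract reports no trials at or above 1 ppsm.
    return "above tested range"
```

For example, 12 pedestrians in a 30 m² observation area give 0.4 ppsm, which falls in the mid band.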
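Mobile manipulator throwing is a promising way to increase the flexibility and efficiency of dynamic manipulation in factories. Its main challenge is to efficiently plan a feasible throw under a wide range of task specifications. We analyze the throwing problem and show that it can be reduced to a simpler planar problem, which greatly reduces the computational cost. Using data analysis and machine learning, we build models of the object's inverted flying dynamics and of the robot's kinematic feasibility, which enable throwing motions to be generated within 1 ms of a target position query. Thanks to the computational efficiency of our method, we show that the system is adaptive when disturbed during task execution: it replans on the fly to find an alternative throw instead of sticking to the original plan. Code is available at: https://github.com/liuuyangdh/mobile-throwing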
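To illustrate what an inverse flight query looks like in the reduced planar setting, here is a minimal sketch. It is not the learned model from the paper: it assumes a drag-free point mass, works in the vertical plane through the release point and the target, and uses hypothetical function names.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def release_speed(distance, height, angle):
    """Speed needed at release `angle` (radians above horizontal) to pass
    through a point `distance` m ahead and `height` m above the release
    point. Returns None when no drag-free throw at that angle can reach it."""
    if distance <= 0 or not 0 < angle < math.pi / 2:
        return None
    # From y = x*tan(a) - g*x^2 / (2*v^2*cos(a)^2), solved for v at (distance, height).
    drop = distance * math.tan(angle) - height
    if drop <= 0:
        return None
    return math.sqrt(G * distance ** 2 / (2 * math.cos(angle) ** 2 * drop))


def height_at(distance, speed, angle):
    """Forward model: projectile height after travelling `distance` horizontally."""
    t = distance / (speed * math.cos(angle))
    return speed * math.sin(angle) * t - 0.5 * G * t * t
```

Feeding the result of `release_speed` back into `height_at` recovers the target height, which is a quick consistency check between the inverse and forward models.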
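This paper presents a novel approach to learning stable robot control laws driven by dynamical systems (DS). The approach requires a single demonstration and can infer stable dynamics in arbitrarily high dimensions. It relies on the idea that there exists a latent space in which the nonlinear dynamics appear quasi-linear. The original nonlinear dynamics are mapped onto a stable linear DS by exploiting the properties of graph embeddings. We show that the eigendecomposition of the graph Laplacian yields linear embeddings in two dimensions and quasi-linear embeddings in higher dimensions. The nonlinear terms vanish exponentially as the number of data points increases, and for a large density of points the embedding appears linear. We show that this new embedding can model highly nonlinear dynamics in high dimensions and outperforms alternative techniques in both reconstruction accuracy and the number of parameters required for the embedding. We demonstrate its applicability to the control of a real robot tasked with performing complex free motions in space.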
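In recent decades, technological advances have made it possible to collect large data sets. In this context, model-based clustering is a very popular, flexible, and interpretable methodology for data exploration within a well-defined statistical framework. One consequence of the growth of large data sets is that missing values are more frequent. However, traditional approaches (such as discarding observations with missing values or imputation methods) are not designed for clustering. In addition, they rarely apply to the general case, frequent in practice, of Missing Not At Random (MNAR) values, that is, when the missingness depends on the unobserved data values and possibly also on the observed data values. The goal of this paper is to propose a novel approach that embeds MNAR data directly within model-based clustering algorithms. We introduce a selection model for the joint distribution of the data and the missing-data indicator. It corresponds to a mixture model for the data distribution and a general MNAR model for the missing-data mechanism, which may depend on the underlying (unknown) classes and/or on the values of the missing variables themselves. A large set of meaningful MNAR sub-models is derived, and the identifiability of the parameters is studied for each sub-model, which is usually a key issue for any MNAR proposal. The EM and Stochastic EM algorithms are considered for estimation. Finally, we perform empirical evaluations of the proposed sub-models on synthetic data and illustrate the relevance of our method on a medical register, the TraumaBase(R) dataset.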
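The definition of MNAR above can be illustrated with a small simulation in which the probability that a value goes missing depends on the value itself. The logistic mechanism, the standard normal data, and the function names are assumptions chosen for illustration; they are not the sub-models or the estimation procedure studied in the paper.

```python
import math
import random


def simulate_mnar(n, slope, seed=0):
    """Draw n standard normal values and hide each one with probability
    sigmoid(slope * x), so larger values are more likely to be missing."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-slope * x))
        full.append(x)
        if rng.random() >= p_missing:
            observed.append(x)
    return full, observed


def mean(values):
    return sum(values) / len(values)
```

With a positive slope, the mean of the observed values is pulled well below the mean of the full sample, which is why simply discarding incomplete observations biases downstream estimates. Setting `slope=0` makes the missingness independent of the data, and the bias disappears.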
We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-grained differences in their visual attributes. By navigating this manifold in directions that increase memorability, we can visualize what it looks like for a particular generated image to become more or less memorable. The resulting "visual definitions" surface image properties (like "object size") that may underlie memorability. Through behavioral experiments, we verify that our method indeed discovers image manipulations that causally affect human memory performance. We further demonstrate that the same framework can be used to analyze image aesthetics and emotional valence. Visit the GANalyze website at http://ganalyze.csail.mit.edu/.
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets (Something-Something, Jester, and Charades) which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos.
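A toy sketch can show why relating frames in temporal order matters. It assumes the two-frame relation is a sum of a pairwise function over all ordered frame pairs (earlier, later). The pairwise function here is a hypothetical hand-written stand-in for the learned network, and scalar "features" replace real frame embeddings.

```python
from itertools import combinations


def pair_relation(earlier, later):
    """Hypothetical stand-in for a learned pairwise relation function:
    the signed change between two scalar frame features."""
    return later - earlier


def two_frame_relation(frame_features):
    """Sum the pairwise relation over all frame pairs taken in temporal order."""
    indices = range(len(frame_features))
    return sum(
        pair_relation(frame_features[i], frame_features[j])
        for i, j in combinations(indices, 2)  # yields only pairs with i < j
    )
```

Calling `two_frame_relation([0.0, 1.0, 3.0])` gives `6.0`, while the reversed sequence gives `-6.0`: an order-aware relation separates a clip from its time-reversed version, which an order-free average of per-frame features could not do.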
We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
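One way to score the alignment between a hidden unit and a concept is the intersection over union (IoU) between the unit's thresholded activation map and a binary concept mask. The sketch below assumes that scoring rule and a caller-supplied threshold; the function name and the nested-list data layout are hypothetical.

```python
def unit_concept_iou(activation, concept_mask, threshold):
    """IoU between the region where a unit fires above `threshold` and
    the region annotated with a concept. Both inputs are 2D nested lists
    of the same shape; `concept_mask` holds booleans."""
    intersection = 0
    union = 0
    for act_row, mask_row in zip(activation, concept_mask):
        for value, in_concept in zip(act_row, mask_row):
            active = value > threshold
            labelled = bool(in_concept)
            intersection += int(active and labelled)
            union += int(active or labelled)
    # An empty union means the unit never fires and the concept is absent.
    return intersection / union if union else 0.0
```

A unit would then be labelled with the concept that gives it the highest score, provided that score clears some minimum value chosen by the analyst.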
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
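The link between global average pooling and localization can be sketched as follows, assuming the class score is a weighted sum of globally averaged feature maps (bias omitted). Applying the same class weights to the feature maps before pooling yields a spatial map whose mean equals that score, so high values mark the regions that drive the prediction. The function names and the nested-list layout are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last convolutional feature maps for one class.
    `feature_maps` is a list of 2D nested lists, one per channel."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * fmap[r][c]
    return cam


def gap_class_score(feature_maps, class_weights):
    """Class score from global average pooling followed by a linear layer."""
    score = 0.0
    for fmap, weight in zip(feature_maps, class_weights):
        pooled = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        score += weight * pooled
    return score
```

Because both functions are linear in the feature maps, the spatial mean of the map returned by `class_activation_map` equals `gap_class_score` for the same inputs.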